Method for debugging reconfigurable architectures

ABSTRACT

A method for debugging reconfigurable hardware is described. According to this method, all necessary debug information is written in each configuration cycle into a memory, which is then analyzed by the debugger.

FIELD OF THE INVENTION

The present invention relates to methods for debugging programs on reconfigurable architectures.

BACKGROUND INFORMATION

Reconfigurable architecture refers to modules (VPUs) having a configurable function and/or interconnection, in particular integrated modules having a plurality of one-dimensionally or multidimensionally arranged arithmetic and/or logic and/or analog and/or memory and/or interconnecting modules (hereinafter referred to as PAEs) and/or communicative/peripheral modules (IOs) that are interconnected directly or via one or more bus systems. PAEs are arranged in any configuration, combination, and hierarchy. This system is referred to below as a PAE array or PA.

The generic class of such modules includes in particular systolic arrays, neural networks, multiprocessor systems, processors having a plurality of arithmetic units and/or logic cells, interconnection and network modules such as crossbar switches, as well as conventional modules of the generic types FPGA, DPGA, XPUTER, etc. In this connection, reference is made in particular to the following applications of the same applicant: P 44 16 881.0-53, DE 197 81 412.3, DE 197 81 483.2, DE 196 54 846.2-53, DE 196 54 593.5-53, DE 197 04 044.6-53, DE 198 80 129.7, DE 198 61 088.2-53, DE 199 80 312.9, PCT/DE 00/01869, DE 100 36 627.9-33, DE 100 28 397.7, DE 101 10 530.4, DE 101 11 014.6, PCT/EP 00/10516, EP 01 102 674.7, DE 102 06 856.9, 60/317,876, DE 102 02 044.2, DE 101 29 237.6-53, DE 101 39 170.6. These are herewith incorporated to the full extent for disclosure purposes.

In addition, it should be pointed out that the methods to be described here may be used for groups of multiple modules. Nevertheless, reference is made below to a VPU and/or to “modules.” These modules and their operations are to be further improved.

SUMMARY

An object of the present invention is to provide something novel for commercial use.

A plurality of variants and hardware implementations (which make efficient debugging of VPU systems possible) are presented in the following.

1. Example Embodiments

In a preferred variant, debugging is performed either by using a microcontroller appropriately connected to a VPU or the module or by the load logic according to the patents P 44 16 881.0-53, DE 196 51 075.9, DE 196 54 846.2-53, DE 196 54 593.5-53, DE 197 57 200.6-33, DE 198 07 872.2, DE 101 39 170.6, DE 199 26 538.0, DE 100 28 397.7, the full content of which is herewith incorporated by this reference. As will be seen, however, other hardware variants may also be used.

The following basic methods may be used alternatively and/or jointly here:

-   1.1 Detecting a Debug Condition

1.1.1 Condition

The programmer defines, e.g., within the debugging tool, one or more conditions which start debugging (cf. breakpoint according to the related art). The occurrence of the conditions is detected at run time in the VPU and/or in any device exchanging data with the VPU. This preferably takes place due to the occurrence of certain data values with certain variables and/or certain trigger values with certain PAEs.

1.1.2 Precondition

In the optimum case, a certain condition according to the definition given above may already be defined by the programmer several cycles before the occurrence of the debugging condition. This precludes, from the beginning, certain latency problems which are discussed below.

Two fundamental types of debugging for VPUs are discussed below, the method preferred in each case depending on the choice of the compiler. Method A described below may be particularly suitable for compilers which generate code on the basis of instantiated modules of a hardware description language (or a similar language).

For compilers like those described in DE 101 39 170.6 and additional applications which generate complex instructions according to a method like VLIW, method B described below is particularly suitable. Generally, method B is the method preferred for operation of a VPU or a corresponding module as a processor or coprocessor.

It has been recognized that in particular the use of the two methods A and B together yields the best and most transparent debugging results. In particular, depending on the depth of the error to be debugged, it is possible to perform debugging first with the help of fast debugging method B, and then after adequate localization of the error, to analyze the details in depth by method A.

2. Method A

-   2.1 Basic Principle

After the occurrence of a (pre)condition, the VPU is stopped. The relevant debug information is then transferred from the PAEs to the debug program. The relevant debug information has previously been defined by the programmer within the debug program. After readout of all relevant debug information, the next cycle is executed and the relevant debug information is again read out. This is repeated until the programmer terminates the debugging operation. Instead of stopping the VPU, other methods are optionally also possible. For a given sequence of cycles, for example, data may be made available repeatedly for readout, if this is possible rapidly enough.

-   2.2 Support by the Hardware

2.2.1 Readout from the Registers

Essential for the functioning of the debugger is the possibility of reading back the internal data registers and/or status registers and/or state registers and, optionally, depending on the implementation, other relevant registers and/or signals from the PAEs and/or the network. Readback is performed by a higher-level unit (referred to below as a debug processor (DB)), e.g., a CT or a load logic, another externally connected (host) processor, or a reserved area of the array, and only for selected registers and/or signals (referred to jointly below as debug information). Such a possibility is implementable, for example, with the connection created in PCT/DE 98/00334 between the load logic and the data bus of a PAE (PCT/DE 98/00334 0403, FIG. 4).

It should be pointed out explicitly that serial methods for readout of the registers may also be used. For example, JTAG may be selected, and the DB may also be connected via this method and optionally also as a separate external device, possibly a device that is commonly available on the market (e.g., from Hitex, Karlsruhe).

Since the debugger may have reading and/or writing access to all registers or at least a considerable number of them, it is optionally and preferably possible to omit a significant portion of the (serial) chaining of the registers for test purposes (scan chain) for the production tests of the chip. The scan chain is normally used to permit preloading of test data into all the registers within a chip during production tests and/or to permit the contents of the registers to be read back for test purposes. Preloading and/or reading back then typically take place through test systems (e.g., SZ Test Systems, Amerang) and/or according to the methods described in DE 197 57 200.6-33. The scan chain requires additional, not insignificant hardware complexity and surface area for each register. This may now be eliminated at least for the registers that are debuggable, if, as proposed according to the present invention, production testing systems have access to the registers via suitable interfaces (e.g., parallel, serial, JTAG, etc.).

2.2.2 Stopping or Slowing down the Clock Cycle

The clock may either be stopped or slowed down due to the occurrence of the condition and/or precondition to make available enough time for readout. This debug start is triggered in particular either directly by a PAE that has calculated the (pre)condition(s) or by a higher-level unit (e.g., load logic/CT, host processor) on the basis of any actions, e.g., due to the information that a (pre)condition has occurred on a PAE and/or due to an action within the debug processor and/or through any program and/or any external/peripheral source. Trigger mechanisms according to P 44 16 881.0-53, DE 196 51 075.9-53, DE 197 04 728.9, DE 198 07 872.2, DE 198 09 640.2, DE 100 28 397.7 are available for information. Alternatively, the clock pulse may be slowed down in general in debugging. If only array parts are to be debugged, a partial slowing down of the clock pulse may also be provided.

If the clock pulse is slowed down, all the relevant debug information must be read out of the PAEs by the debug processor within the slowed-down cycle of the processing clock pulse. It is therefore appropriate and preferable to slow down the clock pulse only partially, i.e., to reduce or stop the working clock pulse but to continue the clock pulse for the readout mechanism. In addition, it is reasonable and preferable to supply the registers in general with a clock pulse for data preservation.

After stopping the clock pulse, a single-step mode may be implemented, i.e., the debug processor stops the processing clock pulse until it has read out all the debug information. It restarts the processing clock pulse for one cycle and then stops it again until all relevant debug information has been read out.

The readout clock pulse and the clock pulse of the debug processor are preferably independent of the processing clock pulse of the PAEs, so that data processing is separated from debugging and in particular from readout of debug information.

In terms of the hardware, the clock pulse is stopped or slowed down by conventional methods, such as gated clocks and/or PLLs, and/or splitters or other methods. These means are preferably introduced at suitable locations (nodes) within the clock tree so that global clock control of the deeper branches is implementable. Slowing down the clock pulse of only selected array portions is described in the patent applications of the present applicant cited above.

It is particularly preferable for clock control information to be sent from a higher-level unit (e.g., a load logic/CT, host processor) to all PAEs or to all PAEs that are to be debugged. This may be accomplished preferably via the configuration bus system. The clock control information here is typically transmitted by being broadcast, i.e., all PAEs receive the same information.

For example, the following clock control information may be implemented:

-   STOP: The working clock pulse is stopped.
-   SLOW: The working clock pulse is slowed down.
-   STEP: One processing step (single-step mode) is executed and then the working clock pulse is stopped again.
-   STEP(n): n processing steps are executed and the working clock pulse is stopped again.
-   GO: The working clock pulse continues normally.
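The effect of the clock control information listed above may be illustrated by a small software model. The following is only a sketch under stated assumptions: the class name WorkingClock, its fields, and the SLOW divider of 4 are hypothetical and are not taken from the cited applications. The readout clock is assumed to run freely while the working clock is gated.

```python
from dataclasses import dataclass


@dataclass
class WorkingClock:
    """Toy model of the working clock of a PAE under clock control."""

    mode: str = "GO"        # GO, SLOW or STOP
    steps_left: int = 0     # cycles still granted by STEP / STEP(n)
    slow_divider: int = 4   # assumed SLOW ratio; the text gives no value
    ticks: int = 0          # ticks of the free-running readout clock
    cycles_done: int = 0    # working cycles actually executed

    def command(self, name: str, n: int = 1) -> None:
        """Apply one item of clock control information."""
        if name in ("GO", "SLOW", "STOP"):
            self.mode, self.steps_left = name, 0
        elif name == "STEP":
            # STEP(n): run n working cycles, then fall back to STOP.
            self.mode, self.steps_left = "STOP", n
        else:
            raise ValueError(f"unknown clock control command: {name}")

    def tick(self) -> bool:
        """Advance the readout clock by one tick; True if a working cycle ran."""
        self.ticks += 1
        if self.steps_left > 0:
            self.steps_left -= 1
            ran = True
        elif self.mode == "GO":
            ran = True
        elif self.mode == "SLOW":
            ran = self.ticks % self.slow_divider == 0
        else:  # STOP
            ran = False
        self.cycles_done += ran
        return ran
```

In this model a debug processor would issue STOP, read out the debug information while the readout clock keeps ticking, and then issue STEP to advance the working clock by exactly one cycle.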

The method for stopping and/or slowing down the clock pulse may also be used to reduce power consumption. If no computing power is needed at the moment, a “sleep mode” may be implemented by switching off the working clock pulse (STOP), for example, or through special instructions (SLEEP). If the full computing power is not needed, the clock pulse may be slowed down by using SLOW and/or temporarily suspended by using STEP(n). To this extent, this method may be used optionally and/or in addition to the methods described in German Patent Application No. DE 102 06 653.1 for reducing the power loss in particular.

One problem in broadcasting clock control information is the transmission time of the broadcast through the array of PAEs. At higher clock pulse frequencies, the transmission cannot take place within one working clock cycle. However, it is obligatory for all PAEs to respond to the clock control information at the same time. The clock control information is therefore preferably transmitted over a pipelined bus system similar to the CT bus system described in German Patent Application No. DE 100 28 397.7. In addition, a numerical value (LATVAL) is appended to the clock control information, this numerical value being equal to or greater than the maximum length of the pipeline of the bus system. The numerical value is decremented in cycles in each pipeline step (subtraction of 1). Each PAE receiving clock control information also decrements the numerical value with each clock pulse. This ensures that the numerical value in the pipelined bus system and the PAEs that have already received the clock control information is always exactly the same. If the numerical value reaches a value of 0, this ensures that all the PAEs have received the clock control information. The clock control information then goes into effect and the behavior of the clock pulse is modified accordingly.

Another latency time occurs due to the method described here. This latency may be additionally supported through the register pipeline which is described in greater detail below or, as is particularly preferred, by the definition of the (pre)condition by setting the (pre)condition forward to the extent that the latency time is already taken into account.

The latency time in the single-step mode is negligible because it plays a role only in the shutdown of the clock pulse (STOP). Since the STEP instruction always executes only one step, there is no corruption (delay) of the debug data due to the latency time during single-step operation.
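The LATVAL countdown described above may be checked with a short simulation. This is a sketch under simplifying assumptions that are not taken from the source: the bus is modeled as a linear chain with one PAE per pipeline stage, the first PAE receives the information in cycle 0, and the function name is hypothetical.

```python
from typing import List, Optional


def broadcast_effective_cycles(pipeline_length: int, latval: int) -> List[Optional[int]]:
    """Return, per PAE, the cycle in which the clock control information takes effect."""
    if pipeline_length < 1:
        raise ValueError("the pipeline needs at least one stage")
    if latval < pipeline_length:
        raise ValueError("LATVAL must be at least the pipeline length")

    counters: List[Optional[int]] = [None] * pipeline_length   # None: not yet received
    effective: List[Optional[int]] = [None] * pipeline_length
    in_flight = latval   # value travelling with the information on the bus
    cycle = 0
    while any(e is None for e in effective):
        # PAEs that already hold the information decrement their copy.
        for i, value in enumerate(counters):
            if value is not None:
                counters[i] = value - 1
        # The stage reached in this cycle hands the information to its PAE.
        if cycle < pipeline_length:
            counters[cycle] = in_flight
        in_flight -= 1   # each pipeline step subtracts 1
        # A counter at 0 means the information goes into effect now.
        for i, value in enumerate(counters):
            if value == 0 and effective[i] is None:
                effective[i] = cycle
        cycle += 1
    return effective
```

Under these assumptions every PAE reports the same cycle, namely the cycle whose number equals LATVAL, regardless of how late in the pipeline it received the information.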

2.2.3 Register Pipeline for Compensating for Latency

At higher operating frequencies, there may be a latency time between detecting the debug start and stopping or slowing down the clock pulse. This latency time is precisely predictable because the position of the delaying registers in the VPU is defined by the hardware and/or by the algorithm to be debugged and is therefore exactly calculable by the debugger.

However, due to the latency time, the information made available to the debug processor is shifted, so it is no longer possible to read out the correct debug information. This problem is preferably solved by a suitable definition of the (pre)condition by the programmer. By inserting a multistage register pipeline which transmits the debug information further by one register in each clock pulse, the debug processor is optionally able to use as many cycles of debug information as the register pipeline is long. The length of the register pipeline is to be designed to correspond to the maximum expected latency. Because of the precise calculability of the latency time, the debug program is now able to read the timely correct and relevant debug information out of the register pipeline.

One problem which occurs in using register pipelines is that they are relatively long and are thus expensive, based on the silicon surface area required for implementation.
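The register pipeline for latency compensation may be pictured as a shift register whose stage index equals the age of the debug word in clock pulses. The following is a minimal sketch; the class and method names are hypothetical, and a software deque stands in for the hardware registers.

```python
from collections import deque


class DebugRegisterPipeline:
    """Shift register that keeps the most recent `depth` debug words."""

    def __init__(self, depth: int) -> None:
        # Stage 0 holds the newest word; the oldest word falls out at the end.
        self._stages = deque(maxlen=depth)

    def clock(self, debug_word) -> None:
        """Shift every stage by one register and latch the new debug word."""
        self._stages.appendleft(debug_word)

    def read(self, latency: int):
        """Return the word latched `latency` clock pulses before the newest one."""
        if not 0 <= latency < len(self._stages):
            raise IndexError("latency exceeds the filled pipeline depth")
        return self._stages[latency]
```

If the clock pulse stops a known number of cycles after the (pre)condition occurred, the debug program would call read with that latency to obtain the word that was valid when the condition was met. The depth would be chosen as the maximum expected latency plus one, which is also where the silicon cost mentioned above comes from.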

-   2.3 Visible Debug Information

In this method, debugging is generally performed after occurrence of the (pre)condition because only thereafter is the clock pulse slowed down or stopped and the debug information read out. Debug information prior to occurrence of the (pre)condition is therefore not visible at first.

However, it is also possible, although this also involves a loss of performance, to operate a VPU at a slowed clock pulse or in single-step mode directly from the start of an application. The relevant debug information is then read out by the debug processor from the start.

3. Method B

-   3.1 Basic Principle

Relevant debug information from the memory units, which includes the application data and states of a certain working step in accordance with P 44 16 881.0-53, DE 196 54 846.2-53, DE 199 26 538.0, DE 101 39 170.6 as well as their additional applications and DE 101 10 530.4, is transmitted to the debug program. These memory units, hereinafter also referred to as working memories, operate more or less as registers for storing data which has been calculated within a configuration cycle in the PA or parts of the PA, in the machine model according to P 44 16 881.0-53, DE 196 54 846.2-53, DE 101 39 170.6 and their additional applications DE 199 26 538.0 and DE 101 10 530.4. Reference is made in particular to German Patent Application No. DE 101 39 170.6 and its additional applications which describe in detail the use of the memory units as registers (REG) for implementation of a processor model. The full content of DE 101 39 170.6 and its additional applications is herewith included for disclosure purposes. A memory unit here includes any arrangement and hierarchy of independent and dependent memories. It is possible to execute simultaneously a plurality of different algorithms on the PA (processing array), which then use different memories.

It is essential for the use of this method that data and/or algorithmically relevant states are stored in the memory units assigned to the PAEs, one memory unit in each case being of such size that all the relevant data and/or states of a cycle may be stored there. The length of a cycle may be determined by the size of the memory unit, which it preferably actually is (see DE 196 54 846.2-53). In other words, the cycle length is adapted to the hardware.

Different data and/or states are stored in the memory units in such a way that the latter may be assigned unambiguously to the algorithm. The debugger is therefore able to unambiguously identify the relevant data and/or states (debug information).

The relevant debug information may be determined by the programmer within the debug program, in particular also in advance. This debug information is read out of the memory units. Different methods are available for this, and a few possibilities are discussed in greater detail below. After readout of all relevant debug information, the next configuration cycle is executed and the relevant debug information is again read out. This is repeated until the programmer/debugger aborts the debugging procedure.

In other words, the relevant data and/or status information is not transmitted to the debugger in cycles but instead according to the configuration. It is read out of the memory units that are comparable to the registers of the CPU.
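The basic loop of method B, i.e., readout once per configuration rather than once per clock cycle, may be sketched as follows. The function name, the representation of a configuration as a Python callable, and the dictionary used as working memory are assumptions made purely for illustration.

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple

Memory = Dict[str, Any]
Configuration = Tuple[str, Callable[[Memory], None]]


def debug_by_configuration(
    configurations: Iterable[Configuration],
    memory: Memory,
    watch: Iterable[str],
    should_abort: Callable[[Memory], bool] = lambda snapshot: False,
) -> List[Tuple[str, Memory]]:
    """Run configurations one after another and snapshot the watched entries."""
    watched = list(watch)
    trace: List[Tuple[str, Memory]] = []
    for name, run in configurations:
        run(memory)  # one configuration cycle, executed without interruption
        # Read out only the debug information the programmer selected.
        snapshot = {key: memory[key] for key in watched if key in memory}
        trace.append((name, snapshot))
        if should_abort(snapshot):  # programmer/debugger ends the session
            break
    return trace
```

The working memory is inspected only at configuration boundaries, so the configuration itself is never slowed down in this sketch.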

-   3.2 Support by the Hardware

For the mode of operation of the debugger, it is essential for the CT or another externally connected processor (referred to below as the debug processor (DB)) to be able to read the internal working memory (memories) of the VPU, for example. Such a possibility is provided, for example, by connecting the CT to the working memory for preloading and reading the data and/or by the method described in DE 199 26 538.0 for writing the internal memory to external memories. In one possible embodiment, the working memory may be accessed by various methods of the related art (e.g., shared memory, bank switching) by the debug processor, so that data exchange with the DB may take place largely independently of any other data processing in the VPU.

In one possible embodiment, the clock pulse of the VPU may optionally be either retarded or stopped for readout of the memory, e.g., according to method A by one or more of the measures described above and/or it may optionally be operated in a single-step mode. Depending on the implementation of the working memory, e.g., in the bank switching method, it is possible to eliminate a separate intervention involving the clock pulse. The clock pulse is typically stopped or slowed down according to method B and the working memories are read out and/or copied and/or switched only when a data processing or configuration cycle is ended.

In other words, an important advantage of method B is that it does not require any particular support by the hardware.

In one possible embodiment, a DB need only have access to the working memory. In an example embodiment which is particularly preferred, the working memory is accessed through a suitable configuration of the VPU, which therefore reads out the working memories automatically and without modification and transmits this information to a DB.

-   3.3 Access to Debug Information

Patents and patent applications P 44 16 881.0-53, DE 196 54 846.2-53, DE 101 39 170.6, DE 199 26 538.0 describe data processing methods in which a set of operations is mapped cyclically onto a reconfigurable data processing module. In each cycle, a plurality of data originating from a peripheral source and/or an internal/external working memory and written to a peripheral source and/or an internal/external working memory is calculated. Different working memories and/or in particular a plurality of independent working memories may be used at the same time. For example, in this data processing method, the working memories or some of the working memories function as register sets.

According to DE 101 39 170.6 and DE 199 26 538.0, all data and states relevant for further data processing are stored in the working memory and/or read out of same. In a preferred method, states irrelevant for further data processing are not stored.

The differentiation between relevant and irrelevant states is to be illustrated using the following example, although for disclosure purposes, reference is made in particular to the discussion in DE 101 39 170.6.

The state information of a comparison is essential for further processing of data, for example, because it determines the functions to be executed.

A sequential divider is formed, for example, by mapping a division instruction onto hardware that supports only sequential division. This results in a state which characterizes the computation step within division. This state is irrelevant because the algorithm needs only the result (i.e., the division performed). Therefore, in this case, only the results and the time information (i.e., the availability) are needed.

The time information is available from the RDY/ACK handshake in the VPU technology according to P 44 16 881.0-53, DE 196 51 075.9-53 and DE 199 26 538.0, for example. However, it should be pointed out here in particular that the handshake itself likewise does not constitute a relevant state because it merely signals the validity of the data, so that the remaining relevant information is in turn reduced to the existence of valid data.

DE 101 39 170.6 shows a differentiation between locally relevant states and globally relevant states:

-   Local: The state is relevant only within a single closed configuration. Therefore, this state need not necessarily be stored.
-   Global: The state information is needed for a plurality of configurations. This state must be stored.

It is possible that the programmer might want to debug a locally relevant state that is not stored in the memories. In this case, the application may be modified to create a debug configuration (equivalent to the debug code of processors), having a modification of the “normal” code of the application so that this state is additionally written into the memory unit and is therefore made available to the debugger. This results in a deviation between the debug code and the actual code which may result in a difference in the performance of the codes.

In a particularly preferred embodiment, no debugging configuration is used. Instead, the configuration to be debugged is terminated so that the data additionally required for debugging purposes outlasts the termination, i.e., it remains valid in the corresponding memory locations (REGs) (e.g., registers, counters, memories).

If the configuration to be debugged is terminated in such a way that the data additionally required for debugging purposes outlasts the termination, it is possible to perform debugging easily by not loading the next configuration required in a normal program sequence, but loading instead a configuration through which the data required for debugging purposes is transmitted to the debugging unit, i.e., the debugging means. It should be pointed out that in such debugging, the data required for debugging purposes may always be stored even later in the program run, thereby ensuring that the program which has been executed later has been subject to a debugging process in exactly the same way as required. Normal program execution may continue after readout of the debug information by a dedicated debugging configuration.

It is thus proposed that a configuration be loaded which connects the REGs in a suitable manner and in a defined order to one or more global memories to which the DB has access (e.g., working memories).

The configuration may use address generators, for example, to access the global memory (memories). The configuration may also use address generators, for example, to access REGs designed as memories. According to the configured connection between the REGs, the contents of the REGs are written in a defined order into the global memory, the particular addresses being predetermined by address generators. The address generator generates the addresses for the global memory (memories) in such a way that the described memory areas (DEBUGINFO) may be unambiguously assigned to the remote configuration to be debugged.

This method corresponds to the context switch described in DE 102 06 653.1 and DE 101 39 170.6, the full content of which is incorporated here for disclosure purposes.
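The readout configuration with address generators described above may be sketched in software as follows. The fixed-size DEBUGINFO area per configuration, the identifier config_id, and both function names are assumptions for illustration and are not taken from the cited applications.

```python
from typing import Dict, Iterator, List, Sequence


def address_generator(config_id: int, slots_per_config: int) -> Iterator[int]:
    """Yield consecutive addresses inside the DEBUGINFO area of one configuration."""
    base = config_id * slots_per_config   # assumed layout: one fixed-size area each
    for offset in range(slots_per_config):
        yield base + offset


def dump_regs(
    regs: Dict[str, int],
    order: Sequence[str],
    config_id: int,
    global_memory: List[int],
    slots_per_config: int,
) -> Dict[str, int]:
    """Write the REG contents in a defined order; return a name-to-address map."""
    if len(order) > slots_per_config:
        raise ValueError("DEBUGINFO area too small for the requested REGs")
    layout: Dict[str, int] = {}
    for name, address in zip(order, address_generator(config_id, slots_per_config)):
        global_memory[address] = regs[name]
        layout[name] = address
    return layout
```

Because the base address is derived from the configuration identifier and the order is fixed, the DB can attribute each memory word to exactly one REG of exactly one configuration.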

The DB may then access data within a memory area (DEBUGINFO) which is accessible to it. If debugging is to be performed by a single-step method, a context switch may be performed after each single step of a configuration to be debugged, so that all data is preserved and the information to be debugged is written out of the REGs and into a working memory. While preserving the data, the configuration to be debugged is then reconfigured again and prepared for another single step. This is done for each single step to be debugged of the configuration to be debugged. Reference is made here to the possibility of debugging using the principles known as “wave reconfiguration.”

-   3.4 Visible Debug Information

Debugging before the (pre)condition may be performed easily and without any great loss of performance because the required debug information is available in working memories. The debug information may be secured in a simple manner by transferring the working memories to other memory areas to which the DB preferably has direct access. An even faster method is to switch the working memories by a bank switching method (according to the related art) between the individual configurations so that the debug information is always in a new bank. This switching may take place in a very time-optimizing manner, in the optimum case even without any effect on the processing performance.

It has already been disclosed that in a VPU, data may be transferred by blocks into a memory area, which may also be located outside of the actual PA and/or may have a dual-ported RAM or the like, so that it is readily possible to externally access the information thus written.
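The bank switching variant mentioned above may be pictured with the following sketch. It assumes, purely for illustration, that each configuration writes into a fresh bank while the bank just retired is handed to the DB; how input data reaches the next configuration is left out. The class name and its interface are hypothetical.

```python
from typing import Any, Dict, List


class BankedWorkingMemory:
    """Working memory split into banks that are swapped between configurations."""

    def __init__(self, banks: int = 2) -> None:
        if banks < 2:
            raise ValueError("bank switching needs at least two banks")
        self._banks: List[Dict[str, Any]] = [{} for _ in range(banks)]
        self._active = 0

    @property
    def active(self) -> Dict[str, Any]:
        """Bank currently written by the running configuration."""
        return self._banks[self._active]

    def switch(self) -> Dict[str, Any]:
        """Swap banks at a configuration boundary and return the retired bank."""
        retired = self._banks[self._active]
        self._active = (self._active + 1) % len(self._banks)
        # A fresh bank is installed, so the retired one is never overwritten.
        self._banks[self._active] = {}
        return retired
```

The switch itself copies no data, which corresponds to the statement that switching may take place without any effect on the processing performance; the DB reads the retired bank while the next configuration already runs.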

4. Mode of Operation of the Debugger

The debugger program itself may run on a DB outside of the PA. As an alternative, a VPU itself may form the DB according to the methods used with processors. To do so, a task switch or context switch (SWITCH) may be performed according to the description given in PACT11 (U.S. Published Application No. 2003-0056202). The debug information of the program to be debugged is saved together with the relevant data in a SWITCH and the debugger program, which analyzes the information and/or processes it interactively with the programmer, is loaded. Another SWITCH is then performed (in which the relevant information of the debugger is saved) and the program to be debugged is continued. It should also be mentioned that a partial area of the processor may be provided as a debugger.

The debug information is read by the debugger according to method A and/or B and is saved in a memory and/or memory area that is separate from the data processing and to which the DB preferably has direct access. The breakpoints and (pre)conditions are defined by the debugger program. The debugger program may also assume control of execution of the application, in particular the start of execution and the end of execution.

The debugger makes a suitable working environment available to the programmer, optionally with a graphical interface. In a particularly preferred embodiment, the debugger is integrated into a complex development environment with which it exchanges data and/or control information. In particular, the debugger may save the data read out of the working memories on a data medium (hard drive, CD-ROM) for any further processing and/or may run it within a network (such as Ethernet).

The debugger according to the present invention may also communicate with other tools and in particular other debuggers within a development environment described in DE 101 29 237.6-53. In a preferred embodiment, the control and/or definition of the debug parameters may be taken over from another debugger. Likewise, the debugger may make the debug information generated by it available to another debugger and/or may receive debug information from another debugger.

In particular, the determination of the occurrence of breakpoints and/or a (pre)condition may be implemented by another debugger and/or the units debugged by this other debugger. The debugger according to the present invention and the VPU then respond accordingly.

The other debugger may be in particular the debugger of another processor (CT or ARC in Chameleon, Pentium, AMD, etc.) connected to a VPU.

In particular, the other debugger may run on a processor connected or assigned to the VPU and/or it may be the processor assigned to the DB, e.g., a CT or ARC in Chameleon. In a particularly preferred embodiment, the particular processor may be a host processor such as that described in U.S. Patent Application Ser. No. 60/317,876 and/or DE 102 06 856.9, for example.

5. Evaluation of Methods

Method A is considerably more time- and resource-intensive than method B, which requires hardly any additional hardware, and also omits the time-consuming readout of debug information from the start of the application. Method B is therefore fundamentally preferable. Method B is preferred for compilers described in DE 101 39 170.6 and its related applications.

It has been recognized that in particular using methods A and B together yields the best and most transparent debugging results. In particular, depending on the depth of the error to be debugged, debugging may be performed first with the help of the fast debugging method B and then after adequate localization of the error, debugging may be performed by method A, which analyzes the details in depth.

6. Mixed-mode Debugger

When using method B, which is particularly preferred, the problem may also occur that the visible information in the memories is insufficient.

Typically, detailed debugging may proceed as follows:

-   a) The visible debug information (PREINFO) before configuring a breakpoint-containing configuration is saved. If an error occurs in the breakpoint, a search is then conducted for visible debug information (POSTINFO). Based on the PREINFO information, a software simulator is started, simulating the configuration(s) to be debugged. The simulator may determine each value within the PAEs and the bus systems and output it (optionally also graphically and/or as text), thus providing a detailed insight into the sequence of the algorithm at the point in time when the error occurred. It is possible in particular to compare the simulated values in each case with the values from POSTINFO in order to rapidly recognize any differences.
-   b) The visible debug information before a breakpoint is saved. When a breakpoint occurs, a software visualizer is started based on this information. The module to be debugged is then operated in a single-step method to permit readout of all relevant data according to method A. This data may then be output either directly (including graphically and/or as text, if necessary) and/or relayed to a simulator whose simulation is then based on the more detailed data and may next be output in the known ways.
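Variant a) above, in which a simulation started from PREINFO is compared with POSTINFO, may be sketched as follows. The representation of the debug information as dictionaries, the modeling of a configuration as a list of step functions, and both function names are assumptions for illustration only.

```python
from typing import Any, Callable, Dict, Iterable, Tuple

State = Dict[str, Any]


def simulate(preinfo: State, steps: Iterable[Callable[[State], State]]) -> State:
    """Replay the configuration in software, starting from the saved PREINFO."""
    state = dict(preinfo)   # leave the saved PREINFO untouched
    for step in steps:
        state = step(state)
    return state


def compare_with_postinfo(simulated: State, postinfo: State) -> Dict[str, Tuple[Any, Any]]:
    """Return every entry whose simulated value differs from the hardware value."""
    differences: Dict[str, Tuple[Any, Any]] = {}
    for key in sorted(set(simulated) | set(postinfo)):
        sim_value = simulated.get(key)
        hw_value = postinfo.get(key)
        if sim_value != hw_value:
            differences[key] = (sim_value, hw_value)
    return differences
```

An empty result would indicate that simulator and hardware agree on all visible values; any entry in the result names a value at which in-depth analysis, e.g., by method A, may start.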

6.1 Advantages of a Mixed-mode Debugger

The mixed-mode debugger permits a detailed analysis of the sequences within a module. Because method B makes it possible to work at full speed up to a set breakpoint and then, if necessary, to stop, slow down and/or switch to a single-step mode, the debugging becomes time-efficient, so it becomes possible to test large volumes of data and/or complex algorithms. The preferred use of a simulator after occurrence of the breakpoint on the basis of the current data and states permits detailed insight into the hardware. If the time required for the simulation is too long and/or a 100% correspondence of the simulator to the hardware is questionable, then reading back the data in the single-step mode after occurrence of a breakpoint according to method A or according to the context switching method according to DE 102 06 653.1 and DE 101 39 170.6 permits 100% correct debugging of the algorithm and/or the hardware itself.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 b illustrates a representation of a finite automaton by a reconfigurable architecture.

FIG. 2 illustrates a finite automaton mapped on a reconfigurable architecture.

FIG. 3 shows a possible schematic structure of the debugging according to method B.

FIG. 4 a shows the structure of a particularly preferred VPU.

FIG. 4 b shows the detail of an exemplary CPU system.

FIG. 5 a shows an exemplary hardware design that may be used for debugging reconfigurable processors.

FIG. 5 b shows as an example the expansion according to the present invention.

DETAILED DESCRIPTION OF EXAMPLE EMBODIMENTS

FIGS. 1 and 2 generally correspond to German Patent Application No. DE 101 39 170.6. The different approaches of methods A and B are indicated in the figures (A, B).

FIG. 1 b shows a representation of the finite automaton by a reconfigurable architecture described in P 44 16 881.0-53 and DE 196 54 846.2-53 (DE 196 54 846.2-53, FIGS. 12 through 15). The combinatory network of FIG. 1 a (0101) is replaced by a system of PAEs 0107 (0101 b). Register (0102) is embodied by a memory (0102 b) capable of storing a plurality of cycles. Feedback according to 0105 takes place through 0105 b. Inputs (0103 b and 0104 b) are equivalent to 0103 and 0104, respectively. Direct access to 0102 b may be implemented through a bus via array 0101 b, if necessary. Output 0106 b is in turn equivalent to 0106.

FIG. 2 shows an illustration of a finite automaton mapped on a reconfigurable architecture, 0201(x) representing the combinatory network (which may be embodied as a PAE according to FIG. 1 b). There are one or more memories for operands (0202) and one or more memories for results (0203). Additional data inputs/outputs (0103 b, 0104 b, 0106 b) are not shown for the sake of simplicity. An address generator (0204, 0205) is assigned to each memory.

Operand and result memories (0202, 0203) are linked together physically or virtually so that the results of a function may be used as the operands of another memory and/or results and operands of a function may also be used as the operands of another memory. Such a linkage may be established through bus systems, for example, or via (re)configuration whereby the function and interconnection of the memories with 0201 are reconfigured.

FIG. 3 shows a possible schematic structure of the debugging according to method B. Reference should be made in particular to FIGS. 19, 20 and 21 of German Patent Application No. DE 199 26 538.0 in which the basis of the memories is described. The full content of DE 199 26 538.0 is herewith incorporated for disclosure purposes.

0101 b and 0102 b are shown as already described. In addition, an external memory unit (0302) is also shown which may be connected (0307) to 0102 b, as in DE 199 26 538.0. Both 0102 b and 0302 may be external or internal memory units. Likewise, a memory unit is to be understood as at least one register, a set of registers or a memory (RAM, flash, etc.) or a bulk memory (hard drive, tape, etc.).

Debugging unit 0301 may set breakpoints within 0101 b (0303) on the basis of which the actual debugging operation is triggered. On reaching a breakpoint, information (0304) is sent to 0301, starting the debugging operation. At the same time, all procedures for debugging (e.g., stopping and/or slowing down the cycle) within 0101 b are triggered. As an alternative, information may also be generated through 0301 and sent to 0101 b. Via 0305 and/or 0306, it is possible for 0301 to access the data or states from memory 0102 b and/or memory 0302. The access may take place, for example,

-   1. via memory linkage (block move, i.e., copying the memory into another area controlled by 0301),
-   2. via a line (serial or parallel line over which one or more memory areas are transmitted, e.g., JTAG),
-   3. via bus linkages, regardless of the type (the memories are arbitrated as in a DMA method and are processed by 0301).
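
The interplay of breakpoint, notification and memory access may be sketched in software as follows. This is only an illustrative model under stated assumptions: breakpoints are represented here as cycle numbers, memory 0102 b as a Python list, and the access uses the block move variant (item 1); the names `DebugUnit`, `notify` and `run_array` are hypothetical and do not appear in the disclosure.

```python
from dataclasses import dataclass, field


@dataclass
class DebugUnit:
    """Hypothetical software stand-in for debugging unit 0301."""
    breakpoints: set = field(default_factory=set)   # assumed: cycle numbers (0303)
    snapshots: dict = field(default_factory=dict)   # cycle -> copied memory contents

    def notify(self, cycle, memory):
        """Called when a breakpoint is reached (0304): block-move the memory."""
        self.snapshots[cycle] = list(memory)        # a copy, not a reference


def run_array(step, memory, debug_unit, cycles):
    """Run `step` for `cycles` cycles, storing each result in `memory`."""
    for cycle in range(cycles):
        memory.append(step(cycle))                  # array writes into its memory
        if cycle in debug_unit.breakpoints:
            debug_unit.notify(cycle, memory)        # debugger copies the area
    return memory


dbg = DebugUnit(breakpoints={2})
mem = run_array(lambda cycle: cycle * cycle, [], dbg, cycles=5)
```

Because `notify` copies the list, the snapshot taken at the breakpoint is not affected by the cycles that run afterwards, which mirrors the idea of moving the memory into an area controlled by 0301.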

A figure from DE 199 26 538.0 has been selected as an example. It should be pointed out explicitly that generally any memory and any memory linkage (stack, random access, FIFO, etc.) may be processed accordingly.

FIGS. 4 a and 4 b show other possible embodiments; these have been described in German Patent Application No. DE 102 06 856.9, the full content of which is herewith included for disclosure purposes.

FIG. 4 a shows the structure of a particularly preferred VPU. Preferably hierarchical configuration managers (CTs) (0401) control and manage a system of reconfigurable elements (PACs) (0402). The CTs are assigned a local memory for configurations (0403). The memory also has an interface (0404) to a global memory which provides the configuration data. The configuration sequences are controllable via an interface (0405). An interface of reconfigurable elements (0402) for sequence control and event management (0406) is provided; likewise there is an interface for data exchange (0407). For example, one CT may function as a DB.

FIG. 4 b shows a detail of an exemplary CPU system, e.g., a DSP of the C6000 type from Texas Instruments (0451). This shows program memory (0452), data memory (0453), any peripheral (0454) and EMIF (0455). A VPU is integrated as coprocessor (0458) via a memory bus (0456) and a peripheral bus (0457). A DMA controller (EDMA) (0459) may perform any DMA transfers, e.g., between memory (0453) and VPU (0458) or memory (0453) and periphery (0454). In this example, 0451 may function as a DB and in particular the debugger according to the present invention may also be connected to and/or integrated into its debugger.

FIG. 5 a shows an exemplary hardware design that may be used for debugging reconfigurable processors. A pipelined configuration bus 0501 like that described in DE 100 28 397.7 is used for this purpose. The pipeline is composed of a plurality of register stages (0502) in the horizontal and/or vertical direction to achieve higher clock pulse frequencies. The pipelined configuration bus is connected to configuring elements (PAEs) (0503) to supply them with configuration data.

FIG. 5 b shows as an example the expansion according to the present invention. Each register stage (0502) decrements the numerical value (LATVAL) by one (indicated by −1) to compensate for the latency time. Likewise, each PAE (0503), which has already received clock control information, decrements it by one per cycle (indicated by −1/T). It is then possible to have not only write access but also read access to the PAEs and in particular to their internal registers, e.g., via a special control line (RD) to read out debug data. In this example, data to be read and written passes through the bus system through the arrays of PAEs from left to right and in the reverse direction in the bottom row. The configuration bus is also connected back (0504) like a pipeline via register stages (0505). In this example, a higher-level unit (CR/load logic, host processor) (0506) may also have read and write access to the bus like a dedicated test interface (0507). The test interface may have its own test controller and in particular may be compatible with one or more test interfaces available on the market (e.g., JTAG, Tektronix, Rohde & Schwarz, etc.). The choice of the bus controlling unit is made via a multiplexer/demultiplexer unit (0508). A circuit for back-calculating the source address (0509) of debug data arriving via 0504 may be provided in 0509 (shown in parentheses and in italics) or upstream from units 0506 and 0507.

The address calculations within the system shown here are performed as follows: first, the address is applied to bus 0501 through 0506 or 0507. Like the processing of numerical values (LATVAL) for the latency computation, the address is decremented in each register stage (0502 and 0505). As soon as the address is equal to 0, the PAE after the register stage is selected. In the following register stage the address becomes negative so that no other PAEs are activated. If data is read out of a PAE, it is transmitted again together with the address. The address is decremented further in each register stage. A reverse calculation in 0509 of the addresses arriving at 0506 and/or 0507 together with the debugging data is now possible via a simple addition, by adding the number of decrementing register stages to the incoming address value. It should be pointed out that register stages 0502 in FIG. 5 b are designed to be easily distinguishable from register stages 0502 in FIG. 5 a. Namely, in FIG. 5 b, they additionally have a circuit (e.g., multiplexer) for selecting the data to be relayed, either forwarding the data of bus 0501 or forwarding the output of the particular PAE (0503) and thus the debugging data. The arrival of the address value equal to 0 may be used to trigger the circuit.
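
The address arithmetic just described can be followed in a short Python sketch. It is a simplified model resting on assumptions that are not part of the disclosure: one PAE follows each forward register stage (0502), the return path consists of a fixed number of stages (0505), and the names `read_via_bus` and `back_calculate` are hypothetical.

```python
def read_via_bus(address, pae_data, return_stages):
    """Model one read access over the pipelined bus.

    pae_data[i] is the debug data of the PAE that follows forward register
    stage i + 1 (0502); return_stages is the number of stages (0505) on the
    path back. Returns (arriving_address, data); data is None if no PAE
    was selected.
    """
    value, data = address, None
    for pae in pae_data:
        value -= 1              # each stage 0502 decrements the address
        if value == 0:          # address 0 selects the PAE after this stage
            data = pae
    value -= return_stages      # each stage 0505 decrements it further
    return value, data


def back_calculate(arriving_address, total_stages):
    """Recover the source address by adding the number of decrementing stages."""
    return arriving_address + total_stages


pae_data = ["r1", "r2", "r3", "r4"]          # assumed register contents
total_stages = len(pae_data) + 2             # four 0502 stages, two 0505 stages
arriving, data = read_via_bus(3, pae_data, return_stages=2)
source = back_calculate(arriving, total_stages)
```

With these assumed values the address 3 selects the third PAE, arrives back as −3 after six decrementing stages, and the addition in `back_calculate` restores the source address 3.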

It is pointed out here again that dedicated test interface (0507) conforms to industry standards. It may be used for tests during the software debugging procedure and/or for testing during the assembly of hardware components and systems (e.g., assembling circuits on a circuit board) and/or for function tests of the semiconductor module (chip) as part of semiconductor fabrication. In particular, the usual scan chain may be omitted here for testing the register during the function test of the semiconductor or it may at least be minimized because then only the registers that are not triggerable by the bus system (0501) need pass through the scan chain.

Likewise, it is pointed out in particular that the method explained in conjunction with FIG. 5 is by no means limited to use with configuration buses. Ordinary data bus systems may also be used at the different test times and debugging times and types of test and debugging listed previously. In particular, reference should be made in this connection to the data bus system in DE 197 04 742.4. DE 197 04 742.4 is herewith incorporated fully for disclosure purposes. The methods described in conjunction with FIG. 5, easily understandable for an engineer having ordinary technical expertise, may also be applied to DE 197 04 742.4.

Mixed operation of different bus systems such as configuration bus systems, data bus systems according to DE 197 04 742.4 and ordinary data bus systems is also fundamentally possible. Therefore a plurality of test interfaces may be provided or (and this option is technically preferable) multiplexer/demultiplexer stage (0508) may be designed for a plurality of bus systems (n×0501, n×0504).

In conclusion, it should also be mentioned in particular that by connecting back the bus system according to FIG. 5 b, the configuration data which is also to be written into the PAEs is also returned. Use of the configuration buffer memory FIFOs according to DE 100 28 397.7 (FIGS. 8 and 9 (0805, 0903)) may be omitted with the help of the address back-calculation (0509) and returned status line REJ, which indicates rejection of the configuration according to DE 100 28 397.7, DE 198 07 872.2, DE 196 54 593.5-53, because their functionality is now mapped completely via the bus system described here.

8. Definition of Terms

-   Locally relevant state: State that is relevant only within a certain configuration.
-   Globally relevant state: State that is relevant in a plurality of configurations and must be exchanged among the configurations.
-   Relevant state: State that is needed within an algorithm for correct execution thereof and is thus described and used by the algorithm.
-   Irrelevant state: State that is of no significance for the actual algorithm and is also not described in the algorithm but is needed by the hardware performing the execution as a function of the implementation.

1. A method for debugging a program running on hardware including modules that are reconfigurable in a configuration cycle with respect to at least one of function and interconnection, comprising: in each of at least a subset of a plurality of configuration cycles performed during the running of the program and for which debug information is to be obtained, at least a subset of the reconfigurable hardware modules being reconfigured in each of the configuration cycles with respect to the at least one of function and interconnection: writing debug information into a memory; and reading out of the memory the debug information for use by a debugger; analyzing by the debugger the debug information; and during the running of the program, loading a configuration during the debugging after occurrence of a debugging condition according to which information regarding the configuration to be debugged is needed; wherein the reading out of the memory the debug information is performed using the configuration.
2. The method as recited in claim 1, further comprising: performing a cycle process of a configuration to be debugged, step by step.
3. The method as recited in claim 2, further comprising: simulating a configuration to be debugged according to readout of relevant information or according to previously available information.
4. The method as recited in claim 1, further comprising: simulating a configuration to be debugged according to readout of relevant information or according to previously available information.
5. A method for debugging a program running on hardware including modules that are reconfigurable in a configuration cycle with respect to at least one of function and interconnection, comprising: in each of at least a subset of a plurality of configuration cycles performed during the running of the program and for which debug information is to be obtained, at least a subset of the reconfigurable hardware modules being reconfigured in each of the configuration cycles with respect to the at least one of function and interconnection: writing debug information into a memory; and reading out of the memory the debug information for use by a debugger; analyzing by the debugger the debug information; during the running of the program, loading a configuration during the debugging after occurrence of a debugging condition according to which information regarding the configuration to be debugged is needed, wherein the reading out of the memory the debug information is performed using the configuration; and writing the debug information into a debugging unit or a debugging configuration.
6. The method as recited in claim 5, further comprising: simulating a configuration to be debugged according to readout of relevant information or according to previously available information.
7. A system, comprising: a hardware including modules that are reconfigurable in a configuration cycle with respect to at least one of function and arithmetic units interconnection; and a debugging arrangement to debug a program while running on the hardware, wherein the debugging arrangement includes a memory to store debug information, wherein: in each of at least a subset of a plurality of configuration cycles (a) performed during the running of the program on the hardware, (b) during which at least a subset of the reconfigurable hardware modules are reconfigured with respect to at least one of function and interconnection, and (c) for which the debug information is to be obtained, the debugging arrangement is configured to: write the debug information into the memory; and read the debug information out of the memory for use by the debugging arrangement; the debugging arrangement analyzes the debug information; during the running of the program, a configuration is loaded during the debugging after occurrence of a debugging condition according to which information regarding the configuration to be debugged is needed; and the reading of the debug information out of the memory is performed using the configuration.
8. The system as recited in claim 7, wherein the memory is a dual-ported RAM having a first input for information to be saved from the field and a second input for readout of information into an analysis device.
9. A method for debugging a program running on hardware including modules that are reconfigurable in a configuration cycle with respect to at least one of function and interconnection, comprising: in each of at least a subset of a plurality of configuration cycles performed during the running of the program and for which debug information is to be obtained, at least a subset of the reconfigurable hardware modules being reconfigured in each of the configuration cycles with respect to the at least one of function and interconnection: writing debug information into a memory; and reading out of the memory the debug information for use by a debugger; analyzing by the debugger the debug information; during the running of the program, loading a configuration during the debugging after occurrence of a debugging condition according to which information regarding the configuration to be debugged is needed, wherein the reading out of the memory the debug information is performed using the configuration; and altering a configuration to be debugged before the debugging in such a way that information not needed in normal non-debugging execution is stored in a memory.
10. The method as recited in claim 9, further comprising: writing the debug information into a debugging unit or a debugging configuration.
11. The method as recited in claim 10, further comprising: simulating a configuration to be debugged according to readout of relevant information or according to previously available information.
12. The method as recited in claim 9, further comprising: simulating a configuration to be debugged according to readout of relevant information or according to previously available information.
13. A method for debugging a program running on hardware including modules that are reconfigurable in a configuration cycle with respect to at least one of function and interconnection, comprising: in each of at least a subset of a plurality of configuration cycles performed during the running of the program and for which debug information is to be obtained, at least a subset of the reconfigurable hardware modules being reconfigured in each of the configuration cycles with respect to the at least one of function and interconnection: writing debug information into a memory; and reading out of the memory the debug information for use by a debugger; analyzing by the debugger the debug information; during the running of the program, loading a configuration during the debugging after occurrence of a debugging condition according to which information regarding the configuration to be debugged is needed, wherein the reading out of the memory the debug information is performed using the configuration; and at least partially slowing down or stopping a clock pulse frequency for readout.
14. The method as recited in claim 5, further comprising: at least partially slowing down or stopping a clock pulse frequency for readout.
15. The method as recited in claim 14, further comprising: simulating a configuration to be debugged according to readout of relevant information or according to previously available information.
16. The method as recited in claim 13, further comprising: simulating a configuration to be debugged according to readout of relevant information or according to previously available information.
17. A method for debugging a program running on hardware including modules that are reconfigurable in a configuration cycle with respect to at least one of function and interconnection, comprising: in each of at least a subset of a plurality of configuration cycles performed during the running of the program and for which debug information is to be obtained, at least a subset of the reconfigurable hardware modules being reconfigured in each of the configuration cycles with respect to the at least one of function and interconnection: writing debug information into a memory; and reading out of the memory the debug information for use by a debugger; analyzing by the debugger the debug information; during the running of the program, loading a configuration during the debugging after occurrence of a debugging condition according to which information regarding the configuration to be debugged is needed, wherein the reading out of the memory the debug information is performed using the configuration; altering a configuration to be debugged before the debugging in such a way that information not needed in normal non-debugging execution is stored in a memory; and at least partially slowing down or stopping a clock pulse frequency for readout.
18. The method as recited in claim 17, further comprising: simulating a configuration to be debugged according to readout of relevant information or according to previously available information.
19. A method for debugging a program running on hardware including modules that are reconfigurable in a configuration cycle with respect to at least one of function and interconnection, comprising: in each of at least a subset of a plurality of configuration cycles performed during the running of the program and for which debug information is to be obtained, at least a subset of the reconfigurable hardware modules being reconfigured in each of the configuration cycles with respect to the at least one of function and interconnection: writing debug information into a memory; and reading out of the memory the debug information for use by a debugger; analyzing by the debugger the debug information; during the running of the program, loading a configuration during the debugging after occurrence of a debugging condition according to which information regarding the configuration to be debugged is needed, wherein the reading out of the memory the debug information is performed using the configuration; writing the debug information into a debugging unit or a debugging configuration; altering a configuration to be debugged before the debugging in such a way that information not needed in normal non-debugging execution is stored in a memory; and at least partially slowing down or stopping a clock pulse frequency for readout.
20. The method as recited in claim 19, further comprising: simulating a configuration to be debugged according to readout of relevant information or according to previously available information.
21. A method for debugging a program running on hardware including modules that are reconfigurable in a configuration cycle with respect to at least one of function and interconnection, comprising: in each of at least a subset of a plurality of configuration cycles performed during the running of the program and for which debug information is to be obtained, at least a subset of the reconfigurable hardware modules being reconfigured in each of the configuration cycles with respect to the at least one of function and interconnection: writing debug information into a memory; and reading out of the memory the debug information for use by a debugger; analyzing by the debugger the debug information; during the running of the program, loading a configuration during the debugging after occurrence of a debugging condition according to which information regarding the configuration to be debugged is needed, wherein the reading out of the memory the debug information is performed using the configuration; at least partially slowing down or stopping a clock pulse frequency for readout; and performing a cycle process of a configuration to be debugged, step by step.
22. The method as recited in claim 21, further comprising: simulating a configuration to be debugged according to readout of relevant information or according to previously available information.
23. A method for debugging a program running on hardware including modules that are reconfigurable in a configuration cycle with respect to at least one of function and interconnection, comprising: in each of at least a subset of a plurality of configuration cycles performed during the running of the program and for which debug information is to be obtained, at least a subset of the reconfigurable hardware modules being reconfigured in each of the configuration cycles with respect to the at least one of function and interconnection: writing debug information into a memory; and reading out of the memory the debug information for use by a debugger; analyzing by the debugger the debug information; during the running of the program, loading a configuration during the debugging after occurrence of a debugging condition according to which information regarding the configuration to be debugged is needed, wherein the reading out of the memory the debug information is performed using the configuration; writing the debug information into a debugging unit or a debugging configuration; at least partially slowing down or stopping a clock pulse frequency for readout; and performing a cycle process of a configuration to be debugged, step by step.
24. The method as recited in claim 23, further comprising: simulating a configuration to be debugged according to readout of relevant information or according to previously available information.
25. A method for debugging a program running on hardware including modules that are reconfigurable in a configuration cycle with respect to at least one of function and interconnection, comprising: in each of at least a subset of a plurality of configuration cycles performed during the running of the program and for which debug information is to be obtained, at least a subset of the reconfigurable hardware modules being reconfigured in each of the configuration cycles with respect to the at least one of function and interconnection: writing debug information into a memory; and reading out of the memory the debug information for use by a debugger; analyzing by the debugger the debug information; during the running of the program, loading a configuration during the debugging after occurrence of a debugging condition according to which information regarding the configuration to be debugged is needed, wherein the reading out of the memory the debug information is performed using the configuration; altering a configuration to be debugged before the debugging in such a way that information not needed in normal non-debugging execution is stored in a memory; at least partially slowing down or stopping a clock pulse frequency for readout; and performing a cycle process of a configuration to be debugged, step by step.
26. The method as recited in claim 25, further comprising: simulating a configuration to be debugged according to readout of relevant information or according to previously available information.
27. A method for debugging a program running on hardware including modules that are reconfigurable in a configuration cycle with respect to at least one of function and interconnection, comprising: in each of at least a subset of a plurality of configuration cycles performed during the running of the program and for which debug information is to be obtained, at least a subset of the reconfigurable hardware modules being reconfigured in each of the configuration cycles with respect to the at least one of function and interconnection: writing debug information into a memory; and reading out of the memory the debug information for use by a debugger; analyzing by the debugger the debug information; during the running of the program, loading a configuration during the debugging after occurrence of a debugging condition according to which information regarding the configuration to be debugged is needed, wherein the reading out of the memory the debug information is performed using the configuration; writing the debug information into a debugging unit or a debugging configuration; altering a configuration to be debugged before the debugging in such a way that information not needed in normal non-debugging execution is stored in a memory; at least partially slowing down or stopping a clock pulse frequency for readout; and performing a cycle process of a configuration to be debugged, step by step.
28. The method as recited in claim 27, further comprising: simulating a configuration to be debugged according to readout of relevant information or according to previously available information.